DETAILED ACTION

1. This communication is in response to the Arguments filed on 9/18/2007. Claims
1-33 remain pending and have been examined. Furthermore, the IDS was not
considered as it fails to provide an English translation of the Abstract for the foreign
patents. All objections and rejections that have not been addressed by the Examiner
have been withdrawn.

Change of Art Units 

2. It should be noted that the Examiner has changed art units from 2609 to 2626. 

Information Disclosure Statement 

3. The information disclosure statement filed 11/26/2003 and 05/10/2004 fails to
comply with 37 CFR 1.98(a)(3) because it does not include a concise explanation
(Abstract that has been translated, see below) of the relevance, as it is presently 
understood by the individual designated in 37 CFR 1.56(c) most knowledgeable about 
the content of the information, of each patent listed that is not in the English language. 
It has been placed in the application file, but the information referred to therein has not 
been considered. The information disclosure statement contains foreign patents for 
which no translation of the abstract is provided. 

Response to Arguments 

4. Applicant's arguments, see pages 10-18, filed 09/18/2007, with respect to the 
rejection(s) of claim(s) 1 and 18 under Janiszewski et al. in view of Ching et al. have 
been fully considered and are persuasive. Therefore, the rejection has been withdrawn.
However, upon further consideration, a new ground(s) of rejection is made in view of
Kushner in view of Durlach et al.

Claim Rejections - 35 USC § 103 

5. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

6. Claims 1, 2, 18, and 19 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Kushner et al. (US 6,862,567) in view of Durlach et al. (US 5,828,997) in view of
Eryilmaz (US 5,867,574).

As to claims 1 and 18, Kushner et al. teaches a voice region detection apparatus,
comprising: 

a preprocessing unit for dividing an input voice signal into frames (see
col. 4, lines 6-7, segments acquisition window into frames);

a frame state determination unit for classifying the frames into voice 
frames and noise frames (see col. 4, lines 29-37, speech/noise classifier done by 
microprocessor 110) based on the random parameters extracted by the random 
parameter extraction unit; and 

a voice region detection unit (see col. 5, lines 9-14, microprocessor 110
determines the starting point and ending point of the speech utterance) for



Application/Control Number: 10/721,271 Page 4

Art Unit: 2626 

detecting a voice region by calculating start (see col. 6, lines 8-9 and Figure 2, 
microprocessor 110 determines the starting point and ending point of the speech 
utterance) and end positions of a voice based (see col. 6, lines 65-66 and Figure 
2, endpoint is determined) on the voice and noise frames input from the frame 
state determination unit (e.g. From the determination of a speech utterance the 
voice regions are detected based on energy.). 

However, Kushner et al. does not specifically teach the whitening unit for 
combining white noise to the input frames. 

Durlach et al. does teach the whitening unit combining white noise to the 
input frames (see col. 5, lines 56-65 and Figure 2, target signals (speech) 50a, 
50b, and 50n are added with the noise generator 60 by mixer 56). Although white 
noise is not used when adding to the target signals, it would have been obvious 
to add white noise to a signal or any other type of noise depending on 
environment simulated. 

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have modified the voice recognition as taught by
Kushner et al. with the addition of a whitening unit as taught by Durlach et al. The
motivation to have combined the references involves the ability to incorporate the
directionality of a signal for sound localization (see Durlach et al., col. 5, lines 60-65)
as would benefit the preprocessed signal from Kushner et al. for real-time
environmental simulation.

However, Kushner et al. in view of Durlach et al. do not specifically teach
the random parameter extraction unit.

Eryilmaz does teach the random parameter extraction unit (see col. 4, 
lines 30-42, speech energy value determined) for extracting random parameters 
indicating the randomness of frames (see col. 2, lines 30-55, voice activity is 
detected based on the comparison to a threshold. The determination of energy is 
random since it is not known whether the frame is voice or noise. Further, the 
randomness is addressed by indicating the noise or voice present in the signal). 

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have modified the voice recognition as taught by
Kushner et al. in view of Durlach et al. with the addition of a random parameter
extraction unit as taught by Eryilmaz for the purpose of classifying the voice
frames and noise frames as taught by Kushner et al. in view of Durlach et al. by a
method for classification.

As to claims 2 and 19, Kushner et al. in view of Durlach et al. in view of Eryilmaz
teach all of the limitations as in claims 1 and 18 above.

Furthermore, Kushner et al. teaches wherein the preprocessing unit
samples the input voice signal according to a predetermined frequency (see col.
3, line 66 to col. 4, line 10, digitize), and divides the sampled voice signal into a
plurality of frames (see col. 4, lines 4-10, segmentation into frames is performed)
(e.g. The digitization of the voice signal makes the use of a sampling frequency
obvious as the signal is sent to the microprocessor for further processing. It is
obvious that this sampling frequency is utilizing the Nyquist criterion.)

As to claims 4 and 21, Kushner et al. in view of Durlach et al. in view of Eryilmaz
teach all of the limitations as in claims 1 and 18 above.

Furthermore, Durlach et al. teaches wherein the whitening unit comprises
a white noise generation unit (see Figure 2, noise generator 60) for generating 
the white noise, and a signal synthesizing unit (see Figure 2, mixer 56) for 
combining the frames input from the preprocessing unit (see signals 50a, 50b, 
and 50n) with the white noise generated by the white noise generation unit (e.g. 
Noise is added to the target signal.). 

7. Claims 3 and 20 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Kushner et al. in view of Durlach et al. in view of Eryilmaz as applied to claims 2 and 19 
above, and further in view of Mekuria (US 6,182,035). 

As to claims 3 and 20, Kushner et al. in view of Durlach et al. in view of Eryilmaz
teach all of the limitations as in claims 2 and 19 above.

However, Kushner et al. in view of Durlach et al. in view of Eryilmaz do not
specifically teach the frames overlapping with one another.

Mekuria does teach the overlapping of frames (see col. 8, lines 28-29).

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have combined voice recognition as taught by
Kushner et al. in view of Durlach et al. in view of Eryilmaz with the overlapping of
frames as taught by Mekuria. The motivation to have combined the references
involves the use of samples in more than one frame (see Mekuria col. 8, lines 28-29).

8. Claims 5 and 22 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Kushner et al. in view of Durlach et al. in view of Eryilmaz as applied to claims 1 and 18
above, and further in view of Davis et al. (US 2003/0216909).

As to claims 5 and 22, Kushner et al. in view of Durlach et al. in view of Eryilmaz teach all
of the limitations as in claims 1 and 18 above.

Furthermore, Durlach et al. does teach the whitening unit combining white
noise to the input frames (see col. 5, lines 56-65 and Figure 2, target signals
(speech) 50a, 50b, and 50n are added with the noise generator 60 by mixer 56).

Furthermore, Eryilmaz does teach the random parameter extraction unit 
(see col. 4, lines 30-42, speech energy value determined). 

However, Kushner et al. in view of Durlach et al. in view of Eryilmaz do not
specifically teach the calculation of the number of runs consisting of consecutive
identical elements in the frames.

Davis et al. does teach the calculation of the number of runs (see
[0046], consecutive frames that fulfill the energy threshold. If x number of frames
meet the requirement, then it is determined that speech is present; otherwise
noise or non-speech is present) consisting of consecutive identical elements in
the frames.

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have combined voice recognition as taught by
Kushner et al. in view of Durlach et al. in view of Eryilmaz with consecutive runs
of the random parameter as taught by Davis et al. The motivation to have
combined the references involves the ability for the VAD processor to produce
the chance for short-term events triggering a VAD state change (see Davis et al.
[0046]).

9. Claims 7-9 and 24-26 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Kushner et al. in view of Durlach et al. in view of Eryilmaz as applied to claims 1
and 18 above, and further in view of Pastor (US 5,572,623).

As to claims 7 and 24, Kushner et al. in view of Durlach et al. in view of Eryilmaz
teach all of the limitations as in claims 1 and 18, above.

However, Kushner et al. in view of Durlach et al. in view of Eryilmaz do not
specifically teach the voice frames including vocal frames and fricative frames.

Pastor does teach the frames including vocal and fricative frames (see col. 
4, lines 66-67 and col. 5, lines 5-14). 

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have combined the voice recognition as taught by
Kushner et al. in view of Durlach et al. in view of Eryilmaz with the inclusion of
fricative frames as taught by Pastor. The motivation to have combined the
references involves the inclusion of fricatives that are present at the start and
end of speech (see Pastor col. 1, lines 29-33).

As to claims 8, 9, 25, and 26, Kushner et al. in view of Durlach et al. in
view of Eryilmaz in view of Pastor teach all of the limitations as in claims 7 and
24, above.

Furthermore, Eryilmaz teaches wherein the frame state determination unit
(e.g. voice activity detector) determines that if the random parameter of a frame
extracted by the random parameter extraction unit is below a first threshold (see
col. 2, lines 30-55, voice activity is detected based on the comparison to a
threshold. The determination of energy is random since it is not known whether
the frame is voice or noise.) then it is a vocal frame (e.g. If the noise is below the
value of the threshold, then speech is present or vocal frame. The use of a specific
threshold would have been obvious to one skilled in the art in order to distinguish
voice from noise. Hence, the use of below or above a threshold is a matter of
design choice and relativity. The Applicants do not indicate reasons for selecting
the stated thresholds (see Applicant's Specification, page 11, lines 17-21)).

10. Claims 10, 11, 27, and 28 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Kushner et al. in view of Durlach et al. in view of Eryilmaz in view of
Pastor as applied to claims 8 and 25 above, and further in view of Chong-White et al.
(US 7,065,485).

As to claims 10, 11, 27, and 28, Kushner et al. in view of Durlach et al. in
view of Eryilmaz in view of Pastor teach all of the limitations as in claims 8 and 
25, above. 

However, Kushner et al. in view of Durlach et al. in view of Eryilmaz in
view of Pastor do not specifically teach if the random parameter of a frame 
extracted by the random parameter extraction unit is above a second threshold, 
the relevant frame is a fricative frame. 

Chong-White et al. does teach if the random parameter of a frame
extracted by the random parameter extraction unit (see col. 7, lines 22-25,
energy ratio computed, similar to Eryilmaz) is above a second threshold (see col.
7, lines 46-47, fricatives identified when above a threshold), the relevant frame is
a fricative frame. As to claims 11 and 28, it would have been obvious to select a
threshold value for comparing different types of values of a signal with respect to
a ratio.

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have combined the voice recognition as taught by
Kushner et al. in view of Durlach et al. in view of Eryilmaz in view of Pastor with
the inclusion of a threshold indicating a fricative as taught by Chong-White et al.
The motivation to have combined the references involves the ability to detect
further unvoiced components in a signal consisting of speech and non-speech.
Furthermore, the use of the voice recognition as taught by Kushner et al. in view
of Durlach et al. in view of Eryilmaz in view of Pastor allows the ability to detect
noise, voice and fricatives contained in the signal.

As to claims 12, 13, 29, and 30, Kushner et al. in view of Durlach et al. in
view of Eryilmaz in view of Pastor in view of Chong-White teach all of the
limitations as in claims 8 and 25, above.

Furthermore, Eryilmaz teaches wherein the frame state determination unit 
determines that if the random parameter of the frame extracted by the random 
parameter extraction unit is below the second threshold, the relevant frame is a 
noise frame (see col. 2, lines 30-55, voice activity is detected based on the 
comparison to a threshold. The determination of energy is random since it is not 
known whether the frame is voice or noise).

However, Eryilmaz does not specifically teach the use of two thresholds 
for comparison. 

It would have been obvious to use multiple thresholds for classifying each
frame so that voice and fricative frames can be detected as
taught by Chong-White above. Further, the values for the thresholds used are a
matter of design choice based on the thresholds computed.

11. Claim 14 is rejected under 35 U.S.C. 103(a) as being unpatentable over
Kushner et al. in view of Durlach et al. in view of Eryilmaz as applied to claim 1 above,
and further in view of Rezayee et al. ("An Adaptive KLT Approach for Speech
Enhancement").

As to claim 14, Kushner et al. in view of Durlach et al. in view of Eryilmaz teach 
all of the limitations as in claim 1, above. 

However, Kushner et al. in view of Durlach et al. in view of Eryilmaz do
not specifically teach a color noise elimination unit for eliminating color noise
from voice.

Rezayee et al. teaches the enhancement of speech from colored noise 
(see Abstract). 

It would have been obvious to one of ordinary skill in the art to have
combined the voice recognition as taught by Kushner et al. in view of Durlach et
al. in view of Eryilmaz with the inclusion of a color noise eliminator as taught by
Rezayee et al. The motivation to have combined the references is that colored
noise consists of various noise variances and is not the same as white noise,
which has the same variance (see Rezayee et al. page 87, right column, 3rd
paragraph, lines 12-17).

12. Claims 15 and 31 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Kushner et al. in view of Durlach et al. in view of Eryilmaz in view of Pastor in view
of Chong-White et al. (US 7,065,485) as applied to claims 10 and 27, above and further
in view of Rezayee et al. ("An Adaptive KLT Approach for Speech Enhancement").

As to claims 15 and 31, Kushner et al. in view of Durlach et al. in view of
Eryilmaz in view of Pastor in view of Chong-White et al. teach all of the limitations as in
claim 1, above.

However, Kushner et al. in view of Durlach et al. in view of Eryilmaz in
view of Pastor in view of Chong-White et al. do not specifically teach a color
noise elimination unit for eliminating color noise from voice.

Rezayee et al. teaches the enhancement of speech from colored noise 
(see Abstract). 

It would have been obvious to one of ordinary skill in the art to have
combined the voice recognition as taught by Kushner et al. in view of Durlach et
al. in view of Eryilmaz in view of Chong-White with the inclusion of a color noise
eliminator as taught by Rezayee et al. The motivation to have combined the
references is that colored noise consists of various noise variances and is not
the same as white noise, which has the same variance (see Rezayee et al. page 87,
right column, 3rd paragraph, lines 12-17). Furthermore, it should be noted that the
elimination of colored noise is being done when speech is present.
Hence, the detection of a vocal frame will entail that speech is present and further
enhance the signal from colored noise.



Allowable Subject Matter

13. Claims 6, 16, 17, 23, 32, and 33 are objected to as being dependent upon a
rejected base claim, but would be allowable if rewritten in independent form including all 
of the limitations of the base claim and any intervening claims. 

14. The following is a statement of reasons for the indication of allowable subject
matter: None of the prior art, alone or in combination, teaches the following limitations:
NR=R/n, as recited in claims 6 and 23; "color noise ... obtained ... amount of reduction
in the random parameter ... due to color noise" as recited in claims 16, 17, 32, and 33.

Conclusion 

15. Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Paras Shah whose telephone number is (571)270-1650. 
The examiner can normally be reached on MON.-THURS. 7:00a.m.-4:00p.m. EST. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Patrick Edouard can be reached on (571)272-7603. The fax phone number 
for the organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for
published applications may be obtained from either Private PAIR or Public PAIR.
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 